Residual Advantage Learning Applied to a Differential Game

Author

• Mance E. Harmon

Abstract

An application of reinforcement learning to a differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual form of advantage learning. The game is a Markov decision process (MDP) with continuous states and nonlinear dynamics. It has two players, a missile and a plane: the missile pursues the plane, and the plane evades the missile. On each time step, each player chooses one of two possible actions: turn left or turn right 90 degrees. Reinforcement is given only when the missile hits the plane or the plane reaches an escape distance from the missile. The advantage function is stored in a single-hidden-layer sigmoidal network. The reinforcement learning algorithm for optimal control is modified for differential games so that it finds the minimax point rather than the maximum. As far as we know, this is the first time that a reinforcement learning algorithm with guaranteed convergence for general function approximation systems has been demonstrated to work with a general neural network.
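The abstract names the algorithm but not its update rule. The following is a minimal sketch, not the paper's code, assuming the standard advantage-learning target (which reduces to Q-learning when dt·k = 1), a pure-strategy minimax over the 2×2 joint actions with the missile maximizing and the plane minimizing, and a linear approximator standing in for the sigmoidal network. The names `advantage`, `minimax`, `residual_advantage_update` and the parameters `dt`, `k`, `phi` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout: w[a][b] is the weight vector for missile action a and
# plane action b (0 = turn left, 1 = turn right). Assumption: a linear
# approximator replaces the paper's single-hidden-layer sigmoidal network.
Weights = List[List[List[float]]]


def advantage(w: Weights, feats: Sequence[float], a: int, b: int) -> float:
    """A(x, a, b) as a dot product of weights and state features."""
    return sum(wi * fi for wi, fi in zip(w[a][b], feats))


def minimax(w: Weights, feats: Sequence[float]) -> Tuple[float, int, int]:
    """Assumed pure-strategy minimax: missile (a) maximizes, plane (b) minimizes."""
    best: Optional[Tuple[float, int, int]] = None
    for a in range(len(w)):
        # Plane's best response to missile action a.
        b_min = min(range(len(w[a])), key=lambda b: advantage(w, feats, a, b))
        value = advantage(w, feats, a, b_min)
        if best is None or value > best[0]:
            best = (value, a, b_min)
    assert best is not None
    return best


def residual_advantage_update(
    w: Weights,
    feats: Sequence[float],
    a: int,
    b: int,
    reward: float,
    next_feats: Sequence[float],
    terminal: bool,
    gamma: float = 0.99,
    dt: float = 0.5,
    k: float = 1.0,
    alpha: float = 0.1,
    phi: float = 0.3,
) -> float:
    """One residual advantage-learning step on w (in place); returns the residual.

    Assumed target: (1/(dt*k)) * (r + gamma**dt * V(x')) + (1 - 1/(dt*k)) * V(x),
    where V is the minimax value and V(x') = 0 at a terminal state.
    """
    scale = 1.0 / (dt * k)
    disc = gamma ** dt
    v_x, ax, bx = minimax(w, feats)
    if terminal:
        v_next, an, bn = 0.0, -1, -1
    else:
        v_next, an, bn = minimax(w, next_feats)
    target = scale * (reward + disc * v_next) + (1.0 - scale) * v_x
    error = target - advantage(w, feats, a, b)

    # Residual update: dw = alpha * error * (dA/dw - phi * dTarget/dw).
    # phi = 0 gives the direct update, phi = 1 the residual-gradient update.
    grads = [[[0.0] * len(feats) for _ in row] for row in w]
    for i, f in enumerate(feats):
        grads[a][b][i] += f
        grads[ax][bx][i] -= phi * (1.0 - scale) * f
    if not terminal:
        for i, f in enumerate(next_feats):
            grads[an][bn][i] -= phi * scale * disc * f
    for ai, row in enumerate(grads):
        for bi, g in enumerate(row):
            for i, gi in enumerate(g):
                w[ai][bi][i] += alpha * error * gi
    return error
```

The gradient of the target is taken through the joint action that the minimax selects, treating that selection as fixed. Choosing `phi` between 0 and 1 trades the speed of the direct update against the convergence behavior of the residual-gradient update.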

Similar resources

Reinforcement Learning Applied to a Differential Game

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a p...

Advantage Updating Applied to a Differential Game

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual gradient form of advantage updating. The game is a Markov Decision Process (MDP) with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile a...

Multi-Player Residual Advantage Learning With General Function Approximation

A new algorithm, advantage learning, is presented that improves on advantage updating by requiring that a single function be learned rather than two. Furthermore, advantage learning requires only a single type of update, the learning update, while advantage updating requires two different types of updates, a learning update and a normalization update. The reinforcement learning system uses the ...

The Effects of Game-based Learning on the Grammatical Accuracy of Iranian High school Students

Teaching grammar has always been a problematic area of language teaching. While teachers spend a great deal of time and energy to teach, the students are not eager to learn as they find it a real chore. This study compared two kinds of activities for teaching grammar: games and traditional exercises. It sought to discover the effect of games on the students’ grammatical accuracy. For this purp...

Exploring the Role of M-Game as a Seat of ESP Reading in the Iranian TVT

To direct m-game to be a possible didactic option for Iranian TVT (Technical Vocational Training) trainees, in this study m-game-mediated (Mobile Game-Mediated) materials delivery was incorporated into the conventional teaching method in the blended ESP reading skill platform. So, 52 male trainees from Technical and Vocational College of Isfahan were selected by convenience sampling. Afterwards...

Publication date: 1995